{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Installation\n",
    "! pip install smolagents\n",
    "# To install from source instead of the last release, comment the command above and uncomment the following one.\n",
    "# ! pip install git+https://github.com/huggingface/smolagents.git"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Text-to-SQL"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "在此教程中，我们将看到如何使用 `smolagents` 实现一个利用 SQL 的 agent。\n",
    "\n",
    "> 让我们从经典问题开始：为什么不简单地使用标准的 text-to-SQL pipeline 呢？\n",
    "\n",
    "标准的 text-to-SQL pipeline 很脆弱，因为生成的 SQL 查询可能会出错。更糟糕的是，查询可能出错却不引发错误警报，从而返回一些不正确或无用的结果。\n",
    "\n",
    "👉 相反，agent 系统则可以检视输出结果并决定查询是否需要被更改，因此带来巨大的性能提升。\n",
    "\n",
    "让我们来一起构建这个 agent! 💪\n",
    "\n",
    "首先，我们构建一个 SQL 的环境："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sqlalchemy import (\n",
    "    create_engine,\n",
    "    MetaData,\n",
    "    Table,\n",
    "    Column,\n",
    "    String,\n",
    "    Integer,\n",
    "    Float,\n",
    "    insert,\n",
    "    inspect,\n",
    "    text,\n",
    ")\n",
    "\n",
    "engine = create_engine(\"sqlite:///:memory:\")\n",
    "metadata_obj = MetaData()\n",
    "\n",
    "# create city SQL table\n",
    "table_name = \"receipts\"\n",
    "receipts = Table(\n",
    "    table_name,\n",
    "    metadata_obj,\n",
    "    Column(\"receipt_id\", Integer, primary_key=True),\n",
    "    Column(\"customer_name\", String(16), primary_key=True),\n",
    "    Column(\"price\", Float),\n",
    "    Column(\"tip\", Float),\n",
    ")\n",
    "metadata_obj.create_all(engine)\n",
    "\n",
    "rows = [\n",
    "    {\"receipt_id\": 1, \"customer_name\": \"Alan Payne\", \"price\": 12.06, \"tip\": 1.20},\n",
    "    {\"receipt_id\": 2, \"customer_name\": \"Alex Mason\", \"price\": 23.86, \"tip\": 0.24},\n",
    "    {\"receipt_id\": 3, \"customer_name\": \"Woodrow Wilson\", \"price\": 53.43, \"tip\": 5.43},\n",
    "    {\"receipt_id\": 4, \"customer_name\": \"Margaret James\", \"price\": 21.11, \"tip\": 1.00},\n",
    "]\n",
    "for row in rows:\n",
    "    stmt = insert(receipts).values(**row)\n",
    "    with engine.begin() as connection:\n",
    "        cursor = connection.execute(stmt)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 构建 agent"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "现在，我们构建一个 agent，它将使用 SQL 查询来回答问题。工具的 description 属性将被 agent 系统嵌入到 LLM 的提示中：它为 LLM 提供有关如何使用该工具的信息。这正是我们描述 SQL 表的地方。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "inspector = inspect(engine)\n",
    "columns_info = [(col[\"name\"], col[\"type\"]) for col in inspector.get_columns(\"receipts\")]\n",
    "\n",
    "table_description = \"Columns:\\n\" + \"\\n\".join([f\"  - {name}: {col_type}\" for name, col_type in columns_info])\n",
    "print(table_description)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "```text\n",
    "Columns:\n",
    "  - receipt_id: INTEGER\n",
    "  - customer_name: VARCHAR(16)\n",
    "  - price: FLOAT\n",
    "  - tip: FLOAT\n",
    "```\n",
    "\n",
    "现在让我们构建我们的工具。它需要以下内容：（更多细节请参阅[工具文档](https://huggingface.co/docs/smolagents/main/zh/examples/../tutorials/tools)）\n",
    "\n",
    "- 一个带有 `Args:` 部分列出参数的 docstring。\n",
    "- 输入和输出的type hints。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from smolagents import tool\n",
    "\n",
    "@tool\n",
    "def sql_engine(query: str) -> str:\n",
    "    \"\"\"\n",
    "    Allows you to perform SQL queries on the table. Returns a string representation of the result.\n",
    "    The table is named 'receipts'. Its description is as follows:\n",
    "        Columns:\n",
    "        - receipt_id: INTEGER\n",
    "        - customer_name: VARCHAR(16)\n",
    "        - price: FLOAT\n",
    "        - tip: FLOAT\n",
    "\n",
    "    Args:\n",
    "        query: The query to perform. This should be correct SQL.\n",
    "    \"\"\"\n",
    "    output = \"\"\n",
    "    with engine.connect() as con:\n",
    "        rows = con.execute(text(query))\n",
    "        for row in rows:\n",
    "            output += \"\\n\" + str(row)\n",
    "    return output"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "我们现在使用这个工具来创建一个 agent。我们使用 `CodeAgent`，这是 smolagent 的主要 agent 类：一个在代码中编写操作并根据 ReAct 框架迭代先前输出的 agent。\n",
    "\n",
    "这个模型是驱动 agent 系统的 LLM。`InferenceClientModel` 允许你使用 HF  Inference API 调用 LLM，无论是通过 Serverless 还是 Dedicated endpoint，但你也可以使用任何专有 API。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from smolagents import CodeAgent, InferenceClientModel\n",
    "\n",
    "agent = CodeAgent(\n",
    "    tools=[sql_engine],\n",
    "    model=InferenceClientModel(model_id=\"meta-llama/Meta-Llama-3.1-8B-Instruct\"),\n",
    ")\n",
    "agent.run(\"Can you give me the name of the client who got the most expensive receipt?\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Level 2: 表连接"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "现在让我们增加一些挑战！我们希望我们的 agent 能够处理跨多个表的连接。因此，我们创建一个新表，记录每个 receipt_id 的服务员名字！"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "table_name = \"waiters\"\n",
    "receipts = Table(\n",
    "    table_name,\n",
    "    metadata_obj,\n",
    "    Column(\"receipt_id\", Integer, primary_key=True),\n",
    "    Column(\"waiter_name\", String(16), primary_key=True),\n",
    ")\n",
    "metadata_obj.create_all(engine)\n",
    "\n",
    "rows = [\n",
    "    {\"receipt_id\": 1, \"waiter_name\": \"Corey Johnson\"},\n",
    "    {\"receipt_id\": 2, \"waiter_name\": \"Michael Watts\"},\n",
    "    {\"receipt_id\": 3, \"waiter_name\": \"Michael Watts\"},\n",
    "    {\"receipt_id\": 4, \"waiter_name\": \"Margaret James\"},\n",
    "]\n",
    "for row in rows:\n",
    "    stmt = insert(receipts).values(**row)\n",
    "    with engine.begin() as connection:\n",
    "        cursor = connection.execute(stmt)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "因为我们改变了表，我们需要更新 `SQLExecutorTool`，让 LLM 能够正确利用这个表的信息。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "updated_description = \"\"\"Allows you to perform SQL queries on the table. Beware that this tool's output is a string representation of the execution output.\n",
    "It can use the following tables:\"\"\"\n",
    "\n",
    "inspector = inspect(engine)\n",
    "for table in [\"receipts\", \"waiters\"]:\n",
    "    columns_info = [(col[\"name\"], col[\"type\"]) for col in inspector.get_columns(table)]\n",
    "\n",
    "    table_description = f\"Table '{table}':\\n\"\n",
    "\n",
    "    table_description += \"Columns:\\n\" + \"\\n\".join([f\"  - {name}: {col_type}\" for name, col_type in columns_info])\n",
    "    updated_description += \"\\n\\n\" + table_description\n",
    "\n",
    "print(updated_description)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "因为这个request 比之前的要难一些，我们将 LLM 引擎切换到更强大的 [Qwen/Qwen3-Next-80B-A3B-Thinking](https://huggingface.co/Qwen/Qwen3-Next-80B-A3B-Thinking)！"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sql_engine.description = updated_description\n",
    "\n",
    "agent = CodeAgent(\n",
    "    tools=[sql_engine],\n",
    "    model=InferenceClientModel(model_id=\"Qwen/Qwen3-Next-80B-A3B-Thinking\"),\n",
    ")\n",
    "\n",
    "agent.run(\"Which waiter got more total money from tips?\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "它直接就能工作！设置过程非常简单，难道不是吗？\n",
    "\n",
    "这个例子到此结束！我们涵盖了这些概念：\n",
    "\n",
    "- 构建新工具。\n",
    "- 更新工具的描述。\n",
    "- 切换到更强大的 LLM 有助于 agent 推理。\n",
    "\n",
    "✅ 现在你可以构建你一直梦寐以求的 text-to-SQL 系统了！✨"
   ]
  }
 ],
 "metadata": {},
 "nbformat": 4,
 "nbformat_minor": 4
}
